Same State, Different Task: Continual Reinforcement Learning without Interference
Authors
Abstract
Continual Learning (CL) considers the problem of training an agent sequentially on a set of tasks while seeking to retain performance on all previous tasks. A key challenge in CL is catastrophic forgetting, which arises when performance on a previously mastered task is reduced when learning a new task. While a variety of methods exist to combat forgetting, in some cases tasks are fundamentally incompatible with each other and thus cannot be learnt by a single policy. This can occur in reinforcement learning (RL), where an agent may be rewarded for achieving different goals from the same observation. In this paper we formalize this "interference" as distinct from forgetting. We show that existing CL methods based on single neural network predictors with shared replay buffers fail in the presence of interference. Instead, we propose a simple method, OWL, to address this challenge. OWL learns a factorized policy, using shared feature extraction layers, but separate heads, each specializing on a new task. The separate heads in OWL are used to prevent interference. At test time, we formulate policy selection as a multi-armed bandit problem, and show it is possible to select the best policy for an unknown task using feedback from the environment. The use of bandit algorithms allows the OWL agent to constructively re-use the continually learnt policies at different times during an episode. We show in multiple RL environments that existing replay-based methods fail, while OWL is able to achieve close to optimal performance when trained sequentially.
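The test-time policy selection described in the abstract can be illustrated with a standard multi-armed bandit. The sketch below is a minimal, hypothetical illustration (class and parameter names are assumptions, not the paper's implementation): a UCB1 bandit treats each policy head as an arm and uses episodic reward feedback to settle on the head best suited to the current, unknown task.

```python
import numpy as np

class BanditHeadSelector:
    """UCB1 bandit over policy heads: a hedged sketch of OWL-style
    test-time head selection. Names and the exploration constant `c`
    are illustrative assumptions."""

    def __init__(self, n_heads: int, c: float = 2.0):
        self.counts = np.zeros(n_heads)  # times each head was selected
        self.values = np.zeros(n_heads)  # running mean reward per head
        self.c = c

    def select(self) -> int:
        # Try every head at least once before applying the UCB rule.
        untried = np.where(self.counts == 0)[0]
        if len(untried) > 0:
            return int(untried[0])
        total = self.counts.sum()
        ucb = self.values + np.sqrt(self.c * np.log(total) / self.counts)
        return int(np.argmax(ucb))

    def update(self, head: int, reward: float):
        # Incremental update of the mean reward for the chosen head.
        self.counts[head] += 1
        self.values[head] += (reward - self.values[head]) / self.counts[head]
```

In use, the agent would call `select()` at a decision point, act with the chosen head for a stretch of the episode, then feed the observed reward back via `update()`; over repeated rounds the selector concentrates on the head whose policy matches the active task.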
Similar References
Continual Reinforcement Learning with Complex Synapses
Unlike humans, who are capable of continual learning over their lifetimes, artificial neural networks have long been known to suffer from a phenomenon known as catastrophic forgetting, whereby new learning can lead to abrupt erasure of previously acquired knowledge. Whereas in a neural network the parameters are typically modelled as scalar values, an individual synapse in the brain comprises a...
Reinforcement Learning without an Explicit Terminal State
The article introduces a reinforcement learning framework based on dynamic programming for a class of control problems where no explicit terminal state exists. This situation occurs especially in the context of technical process control: the control task is not terminated once a predefined target value is reached, but instead the controller has to continue to control the system in order to av...
Tuning Continual Exploration in Reinforcement Learning
This paper presents a model allowing to tune continual exploration in an optimal way by integrating exploration and exploitation in a common framework. It first quantifies exploration by defining the degree of exploration of a state as the entropy of the probability distribution for choosing an admissible action. Then, the exploration/exploitation tradeoff is formulated as a global optimization...
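The entropy-based measure mentioned in this snippet is easy to make concrete. The following is a minimal sketch (the function name is an assumption): the degree of exploration of a state is the Shannon entropy of the policy's action distribution there, which is zero for a deterministic policy and maximal for a uniform one.

```python
import math

def degree_of_exploration(action_probs):
    """Shannon entropy of the action distribution at a state, as a
    measure of exploration. Zero-probability actions contribute nothing."""
    return -sum(p * math.log(p) for p in action_probs if p > 0)

# A uniform distribution over 4 actions gives entropy log(4) (maximal
# exploration); a deterministic policy gives entropy 0.
```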
Task-Oriented Reinforcement Learning
Acknowledgement: This thesis is the result of two years of work, during which I have been accompanied and supported by many people. I am extremely indebted to Dr.
Journal
Journal title: Proceedings of the ... AAAI Conference on Artificial Intelligence
Year: 2022
ISSN: 2159-5399, 2374-3468
DOI: https://doi.org/10.1609/aaai.v36i7.20674